Dataset statistics
| Number of variables | 30 |
|---|---|
| Number of observations | 10296 |
| Missing cells | 39658 |
| Missing cells (%) | 12.8% |
| Duplicate rows | 29 |
| Duplicate rows (%) | 0.3% |
| Total size in memory | 2.3 MiB |
| Average record size in memory | 234.0 B |
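The two percentage rows can be re-derived from the counts above. The sketch below assumes that the missing-cell rate is taken over all cells (observations × variables) and the duplicate rate over all observations. The variable names are illustrative and not part of the report.

```python
# Counts copied from the Dataset statistics table.
n_variables = 30
n_observations = 10296
missing_cells = 39658
duplicate_rows = 29

# Assumption: the missing-cell rate is measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: the duplicate rate is measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"total cells:    {total_cells}")
print(f"missing cells:  {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
```

Under these assumptions the printed rates round to the 12.8% and 0.3% shown in the table.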
Variable types
| Numeric | 15 |
|---|---|
| Text | 3 |
| Categorical | 10 |
| Boolean | 2 |
description_exists has constant value "True" | Constant |
Dataset has 29 (0.3%) duplicate rows | Duplicates |
carvana_ad is highly overall correlated with condition and 12 other fields | High correlation |
condition is highly overall correlated with carvana_ad | High correlation |
drive is highly overall correlated with type | High correlation |
manufacturer is highly overall correlated with org_manuf | High correlation |
odometer is highly overall correlated with carvana_ad and 2 other fields | High correlation |
org_manuf is highly overall correlated with manufacturer | High correlation |
price is highly overall correlated with odometer and 1 other field | High correlation |
tfidf_auto is highly overall correlated with carvana_ad | High correlation |
tfidf_car is highly overall correlated with carvana_ad | High correlation |
tfidf_credit is highly overall correlated with carvana_ad | High correlation |
tfidf_miles is highly overall correlated with carvana_ad | High correlation |
tfidf_new is highly overall correlated with carvana_ad | High correlation |
tfidf_power is highly overall correlated with carvana_ad | High correlation |
tfidf_rear is highly overall correlated with carvana_ad | High correlation |
tfidf_text is highly overall correlated with carvana_ad | High correlation |
tfidf_truck is highly overall correlated with carvana_ad | High correlation |
tfidf_vehicle is highly overall correlated with carvana_ad | High correlation |
transmission is highly overall correlated with carvana_ad | High correlation |
type is highly overall correlated with drive | High correlation |
year is highly overall correlated with odometer and 1 other field | High correlation |
title_status is highly imbalanced (90.3%) | Imbalance |
fuel is highly imbalanced (61.8%) | Imbalance |
tfidf_auto has 1895 (18.4%) missing values | Missing |
model has 110 (1.1%) missing values | Missing |
tfidf_miles has 1895 (18.4%) missing values | Missing |
condition has 3861 (37.5%) missing values | Missing |
tfidf_power has 1895 (18.4%) missing values | Missing |
tfidf_new has 1895 (18.4%) missing values | Missing |
title_status has 174 (1.7%) missing values | Missing |
tfidf_vehicle has 1895 (18.4%) missing values | Missing |
lat has 111 (1.1%) missing values | Missing |
type has 2118 (20.6%) missing values | Missing |
tfidf_text has 1895 (18.4%) missing values | Missing |
org_manuf has 397 (3.9%) missing values | Missing |
tfidf_truck has 1895 (18.4%) missing values | Missing |
tfidf_credit has 1895 (18.4%) missing values | Missing |
cylinders has 4248 (41.3%) missing values | Missing |
drive has 3076 (29.9%) missing values | Missing |
tfidf_rear has 1895 (18.4%) missing values | Missing |
paint_color has 2867 (27.8%) missing values | Missing |
long has 111 (1.1%) missing values | Missing |
manufacturer has 3505 (34.0%) missing values | Missing |
tfidf_car has 1895 (18.4%) missing values | Missing |
tfidf_auto has 5296 (51.4%) zeros | Zeros |
tfidf_miles has 4015 (39.0%) zeros | Zeros |
tfidf_power has 4568 (44.4%) zeros | Zeros |
tfidf_new has 5151 (50.0%) zeros | Zeros |
tfidf_vehicle has 3702 (36.0%) zeros | Zeros |
tfidf_text has 4807 (46.7%) zeros | Zeros |
tfidf_truck has 6573 (63.8%) zeros | Zeros |
tfidf_credit has 5168 (50.2%) zeros | Zeros |
tfidf_rear has 5567 (54.1%) zeros | Zeros |
tfidf_car has 5227 (50.8%) zeros | Zeros |
Reproduction
| Analysis started | 2024-10-24 17:28:48.343554 |
|---|---|
| Analysis finished | 2024-10-24 17:29:39.978120 |
| Duration | 51.63 seconds |
| Software version | ydata-profiling v4.10.0 |
| Download configuration | config.json |
tfidf_auto
Real number (ℝ)
HIGH CORRELATION  MISSING  ZEROS 
| Distinct | 3024 |
|---|---|
| Distinct (%) | 36.0% |
| Missing | 1895 |
| Missing (%) | 18.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.042166903 |
| Minimum | 0 |
| Maximum | 0.66149077 |
| Zeros | 5296 |
| Zeros (%) | 51.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
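The percentages in this overview do not all share a denominator. Judging from the figures themselves, Missing (%) and Zeros (%) appear to be relative to all 10296 observations, while Distinct (%) and the mean appear to use only the non-missing values. A minimal check under that assumption follows; the Sum value is the one listed in this variable's descriptive statistics, and the names are illustrative.

```python
# Figures copied from the report for tfidf_auto.
n_observations = 10296   # from Dataset statistics
missing = 1895
zeros = 5296
distinct = 3024
total = 354.24415        # "Sum" in the descriptive statistics

# Rows that actually carry a value for this variable.
n_present = n_observations - missing

# Assumption: these two rates are relative to all observations.
missing_pct = 100 * missing / n_observations
zeros_pct = 100 * zeros / n_observations

# Assumption: these two figures are relative to non-missing values only.
distinct_pct = 100 * distinct / n_present
mean = total / n_present

print(f"non-missing rows: {n_present}")
print(f"missing:  {missing_pct:.1f}%")
print(f"zeros:    {zeros_pct:.1f}%")
print(f"distinct: {distinct_pct:.1f}%")
print(f"mean:     {mean:.6f}")
```

One practical consequence: among the rows that do have a value, the share of zeros is higher than the 51.4% shown, because the 1895 missing rows are counted in the denominator.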
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.061440289 |
| 95-th percentile | 0.20988407 |
| Maximum | 0.66149077 |
| Range | 0.66149077 |
| Interquartile range (IQR) | 0.061440289 |
Descriptive statistics
| Standard deviation | 0.074599757 |
|---|---|
| Coefficient of variation (CV) | 1.7691543 |
| Kurtosis | 5.4051015 |
| Mean | 0.042166903 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.1797535 |
| Sum | 354.24415 |
| Variance | 0.0055651237 |
| Monotonicity | Not monotonic |
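As a quick consistency check, the coefficient of variation and the variance listed above can be recomputed from the standard deviation and the mean. This sketch assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared). Differences in the last digits come from the rounded inputs.

```python
# Figures copied from the descriptive statistics of tfidf_auto.
std = 0.074599757
mean = 0.042166903

# Coefficient of variation: spread relative to the mean.
cv = std / mean

# Variance is the square of the standard deviation.
variance = std ** 2

print(f"CV:       {cv:.5f}")
print(f"variance: {variance:.8f}")
```

A CV well above 1 is what one would expect for a column where more than half of the values are exactly zero and the rest form a long right tail.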
| Value | Count | Frequency (%) |
| 0 | 5296 | 51.4% |
| 0.0154114841 | 3 | < 0.1% |
| 0.04542635118 | 3 | < 0.1% |
| 0.2085570748 | 3 | < 0.1% |
| 0.0937732982 | 3 | < 0.1% |
| 0.1105257677 | 3 | < 0.1% |
| 0.1494165855 | 3 | < 0.1% |
| 0.1080258248 | 3 | < 0.1% |
| 0.1974122447 | 2 | < 0.1% |
| 0.1012856125 | 2 | < 0.1% |
| Other values (3014) | 3080 | 29.9% |
| (Missing) | 1895 | 18.4% |
| Value | Count | Frequency (%) |
| 0 | 5296 | 51.4% |
| 0.001298957247 | 1 | < 0.1% |
| 0.005423575805 | 2 | < 0.1% |
| 0.006002047607 | 1 | < 0.1% |
| 0.00618822919 | 1 | < 0.1% |
| 0.006866204083 | 2 | < 0.1% |
| 0.006943066934 | 2 | < 0.1% |
| 0.007040687998 | 1 | < 0.1% |
| 0.007072917092 | 1 | < 0.1% |
| 0.007148471259 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0.6614907717 | 1 | < 0.1% |
| 0.5402635855 | 1 | < 0.1% |
| 0.527838005 | 1 | < 0.1% |
| 0.5143600559 | 1 | < 0.1% |
| 0.5037958612 | 1 | < 0.1% |
| 0.503396986 | 1 | < 0.1% |
| 0.481138777 | 1 | < 0.1% |
| 0.4803877725 | 1 | < 0.1% |
| 0.4794363711 | 1 | < 0.1% |
| 0.4793805561 | 1 | < 0.1% |
price
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 2178 |
|---|---|
| Distinct (%) | 21.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19925.813 |
| Minimum | 2100 |
| Maximum | 120000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 2100 |
|---|---|
| 5-th percentile | 3750 |
| Q1 | 8250 |
| median | 16950 |
| Q3 | 28727.5 |
| 95-th percentile | 44995 |
| Maximum | 120000 |
| Range | 117900 |
| Interquartile range (IQR) | 20477.5 |
Descriptive statistics
| Standard deviation | 14243.454 |
|---|---|
| Coefficient of variation (CV) | 0.7148242 |
| Kurtosis | 2.4970546 |
| Mean | 19925.813 |
| Median Absolute Deviation (MAD) | 9550.5 |
| Skewness | 1.2570359 |
| Sum | 2.0515617 × 10⁸ |
| Variance | 2.0287597 × 10⁸ |
| Monotonicity | Not monotonic |
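The derived rows for price (range, IQR, CV, sum) follow from the other figures in these tables. The sketch below recomputes them under the standard definitions; it is a check on the reported values, not the tool's own code, and the names are illustrative.

```python
# Figures copied from the quantile and descriptive statistics of price.
minimum, q1, q3, maximum = 2100, 8250, 28727.5, 120000
mean, std = 19925.813, 14243.454
n_present = 10296  # price has no missing values

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle half of the data
cv = std / mean                   # spread relative to the mean
total = mean * n_present          # sum recovered from mean and count

print(f"range: {value_range}")
print(f"IQR:   {iqr}")
print(f"CV:    {cv:.5f}")
print(f"sum:   {total:.7e}")
```

The recovered sum is on the order of 10 to the power 8, which matches the scientific notation used in the Sum row.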
| Value | Count | Frequency (%) |
| 6995 | 93 | 0.9% |
| 8995 | 92 | 0.9% |
| 6500 | 88 | 0.9% |
| 29990 | 84 | 0.8% |
| 26990 | 79 | 0.8% |
| 25990 | 76 | 0.7% |
| 7500 | 76 | 0.7% |
| 3500 | 74 | 0.7% |
| 4500 | 74 | 0.7% |
| 8500 | 71 | 0.7% |
| Other values (2168) | 9489 | 92.2% |
| Value | Count | Frequency (%) |
| 2100 | 11 | 0.1% |
| 2195 | 1 | < 0.1% |
| 2200 | 14 | 0.1% |
| 2250 | 5 | < 0.1% |
| 2299 | 2 | < 0.1% |
| 2300 | 9 | 0.1% |
| 2335 | 1 | < 0.1% |
| 2350 | 3 | < 0.1% |
| 2388 | 1 | < 0.1% |
| 2400 | 13 | 0.1% |
| Value | Count | Frequency (%) |
| 120000 | 1 | < 0.1% |
| 111111 | 1 | < 0.1% |
| 109999 | 1 | < 0.1% |
| 106999 | 1 | < 0.1% |
| 105000 | 2 | < 0.1% |
| 100000 | 1 | < 0.1% |
| 97995 | 1 | < 0.1% |
| 95900 | 1 | < 0.1% |
| 95000 | 3 | < 0.1% |
| 94995 | 1 | < 0.1% |
model
Text
MISSING 
| Distinct | 3502 |
|---|---|
| Distinct (%) | 34.4% |
| Missing | 110 |
| Missing (%) | 1.1% |
| Memory size | 160.9 KiB |
Length
| Max length | 178 |
|---|---|
| Median length | 161 |
| Mean length | 12.359611 |
| Min length | 1 |
Characters and Unicode
| Total characters | 125895 |
|---|---|
| Distinct characters | 79 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 2247 |
|---|---|
| Unique (%) | 22.1% |
Sample
| 1st row | 3 series |
|---|---|
| 2nd row | ranger supercrew lariat |
| 3rd row | romeo stelvio ti sport |
| 4th row | s320 |
| 5th row | corolla |
| Value | Count | Frequency (%) |
| sport | 597 | 2.6% |
| 4d | 583 | 2.6% |
| 1500 | 562 | 2.5% |
| sedan | 468 | 2.1% |
| cab | 430 | 1.9% |
| silverado | 404 | 1.8% |
| f-150 | 257 | 1.1% |
| grand | 228 | 1.0% |
| super | 226 | 1.0% |
| 4x4 | 225 | 1.0% |
| Other values (1938) | 18828 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 12624 | 10.0% |
| e | 9912 | 7.9% |
| r | 9108 | 7.2% |
| a | 9041 | 7.2% |
| s | 7025 | 5.6% |
| t | 6547 | 5.2% |
| i | 5938 | 4.7% |
| o | 5725 | 4.5% |
| l | 5227 | 4.2% |
| c | 5192 | 4.1% |
| Other values (69) | 49556 |
tfidf_miles
Real number (ℝ)
HIGH CORRELATION  MISSING  ZEROS 
| Distinct | 4279 |
|---|---|
| Distinct (%) | 50.9% |
| Missing | 1895 |
| Missing (%) | 18.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.044819933 |
| Minimum | 0 |
| Maximum | 0.64007722 |
| Zeros | 4015 |
| Zeros (%) | 39.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0.0096411751 |
| Q3 | 0.068726804 |
| 95-th percentile | 0.18419527 |
| Maximum | 0.64007722 |
| Range | 0.64007722 |
| Interquartile range (IQR) | 0.068726804 |
Descriptive statistics
| Standard deviation | 0.068943072 |
|---|---|
| Coefficient of variation (CV) | 1.5382234 |
| Kurtosis | 6.1365082 |
| Mean | 0.044819933 |
| Median Absolute Deviation (MAD) | 0.0096411751 |
| Skewness | 2.1398578 |
| Sum | 376.53226 |
| Variance | 0.0047531472 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 4015 | 39.0% |
| 0.02557244391 | 3 | < 0.1% |
| 0.04671041119 | 3 | < 0.1% |
| 0.02298754203 | 3 | < 0.1% |
| 0.04360660947 | 3 | < 0.1% |
| 0.02796926203 | 3 | < 0.1% |
| 0.02572036067 | 3 | < 0.1% |
| 0.02600440516 | 3 | < 0.1% |
| 0.07558347187 | 3 | < 0.1% |
| 0.02643624384 | 3 | < 0.1% |
| Other values (4269) | 4359 | 42.3% |
| (Missing) | 1895 | 18.4% |
| Value | Count | Frequency (%) |
| 0 | 4015 | 39.0% |
| 0.001080646981 | 1 | < 0.1% |
| 0.004472179671 | 1 | < 0.1% |
| 0.004487536387 | 1 | < 0.1% |
| 0.004489532814 | 1 | < 0.1% |
| 0.004491282989 | 1 | < 0.1% |
| 0.004493155209 | 1 | < 0.1% |
| 0.004495259415 | 1 | < 0.1% |
| 0.004498209133 | 1 | < 0.1% |
| 0.004512058295 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 0.6400772192 | 1 | < 0.1% |
| 0.5541132506 | 1 | < 0.1% |
| 0.5510313243 | 1 | < 0.1% |
| 0.5437392713 | 1 | < 0.1% |
| 0.5261941696 | 1 | < 0.1% |
| 0.4997333429 | 1 | < 0.1% |
| 0.4973328746 | 1 | < 0.1% |
| 0.4854199203 | 1 | < 0.1% |
| 0.47155349 | 1 | < 0.1% |
| 0.4697453658 | 1 | < 0.1% |
condition
Categorical
HIGH CORRELATION  MISSING 
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 3861 |
| Missing (%) | 37.5% |
| Memory size | 160.9 KiB |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 1.5420357 |
| Min length | 1 |
Characters and Unicode
| Total characters | 9923 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | -1 |
|---|---|
| 2nd row | -1 |
| 3rd row | 1 |
| 4th row | -1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| -1 | 3363 | 32.7% |
| 1 | 2433 | 23.6% |
| 2 | 514 | 5.0% |
| -2 | 113 | 1.1% |
| -3 | 12 | 0.1% |
| (Missing) | 3861 | 37.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 5796 | |
| - | 3488 | |
| 2 | 627 | 6.3% |
| 3 | 12 | 0.1% |
state
Text
| Distinct | 51 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 160.9 KiB |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
Characters and Unicode
| Total characters | 20592 |
|---|---|
| Distinct characters | 24 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | va |
|---|---|
| 2nd row | mi |
| 3rd row | ks |
| 4th row | az |
| 5th row | ca |
| Value | Count | Frequency (%) |
| ca | 1199 | 11.6% |
| fl | 715 | 6.9% |
| tx | 573 | 5.6% |
| ny | 458 | 4.4% |
| mi | 441 | 4.3% |
| oh | 439 | 4.3% |
| or | 355 | 3.4% |
| pa | 321 | 3.1% |
| co | 318 | 3.1% |
| nc | 312 | 3.0% |
| Other values (41) | 5165 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 3208 | |
| c | 2187 | |
| n | 1926 | 9.4% |
| i | 1644 | 8.0% |
| m | 1447 | 7.0% |
| o | 1355 | 6.6% |
| t | 1236 | 6.0% |
| l | 1196 | 5.8% |
| f | 715 | 3.5% |
| w | 635 | 3.1% |
| Other values (14) | 5043 |
tfidf_power
Real number (ℝ)
HIGH CORRELATION  MISSING  ZEROS 
| Distinct | 3693 |
|---|---|
| Distinct (%) | 44.0% |
| Missing | 1895 |
| Missing (%) | 18.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.05847928 |
| Minimum | 0 |
| Maximum | 0.59413495 |
| Zeros | 4568 |
| Zeros (%) | 44.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.099639715 |
| 95-th percentile | 0.24595528 |
| Maximum | 0.59413495 |
| Range | 0.59413495 |
| Interquartile range (IQR) | 0.099639715 |
Descriptive statistics
| Standard deviation | 0.090608517 |
|---|---|
| Coefficient of variation (CV) | 1.5494123 |
| Kurtosis | 3.7034253 |
| Mean | 0.05847928 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.8698024 |
| Sum | 491.28443 |
| Variance | 0.0082099034 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 4568 | 44.4% |
| 0.02997977763 | 6 | 0.1% |
| 0.05505375296 | 4 | < 0.1% |
| 0.1228374255 | 4 | < 0.1% |
| 0.0155545393 | 3 | < 0.1% |
| 0.1923413504 | 3 | < 0.1% |
| 0.01612965963 | 3 | < 0.1% |
| 0.02744034279 | 3 | < 0.1% |
| 0.1322471159 | 3 | < 0.1% |
| 0.1475943979 | 3 | < 0.1% |
| Other values (3683) | 3801 | 36.9% |
| (Missing) | 1895 | 18.4% |
| Value | Count | Frequency (%) |
| 0 | 4568 | 44.4% |
| 0.004500882071 | 1 | < 0.1% |
| 0.004544849028 | 1 | < 0.1% |
| 0.004568178518 | 1 | < 0.1% |
| 0.004578363538 | 1 | < 0.1% |
| 0.004608579438 | 1 | < 0.1% |
| 0.004612099701 | 1 | < 0.1% |
| 0.004625619689 | 1 | < 0.1% |
| 0.004643681413 | 1 | < 0.1% |
| 0.004775087211 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0.5941349492 | 1 | < 0.1% |
| 0.5763887728 | 1 | < 0.1% |
| 0.5628034591 | 1 | < 0.1% |
| 0.559433739 | 1 | < 0.1% |
| 0.5517589513 | 1 | < 0.1% |
| 0.5368876003 | 1 | < 0.1% |
| 0.5368273107 | 1 | < 0.1% |
| 0.5313440329 | 1 | < 0.1% |
| 0.5287529745 | 1 | < 0.1% |
| 0.5123783217 | 1 | < 0.1% |
tfidf_new
Real number (ℝ)
HIGH CORRELATION  MISSING  ZEROS 
| Distinct | 3148 |
|---|---|
| Distinct (%) | 37.5% |
| Missing | 1895 |
| Missing (%) | 18.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.049130921 |
| Minimum | 0 |
| Maximum | 0.91233722 |
| Zeros | 5151 |
| Zeros (%) | 50.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.04811926 |
| 95-th percentile | 0.25414037 |
| Maximum | 0.91233722 |
| Range | 0.91233722 |
| Interquartile range (IQR) | 0.04811926 |
Descriptive statistics
| Standard deviation | 0.10372671 |
|---|---|
| Coefficient of variation (CV) | 2.1112308 |
| Kurtosis | 14.168765 |
| Mean | 0.049130921 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.3565078 |
| Sum | 412.74887 |
| Variance | 0.010759231 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 5151 | 50.0% |
| 0.04482418259 | 4 | < 0.1% |
| 0.03013048127 | 3 | < 0.1% |
| 0.03096911277 | 3 | < 0.1% |
| 0.02995720209 | 3 | < 0.1% |
| 0.01765745591 | 3 | < 0.1% |
| 0.3539721059 | 3 | < 0.1% |
| 0.01501974171 | 3 | < 0.1% |
| 0.175638604 | 3 | < 0.1% |
| 0.01719898031 | 3 | < 0.1% |
| Other values (3138) | 3222 | 31.3% |
| (Missing) | 1895 | 18.4% |
| Value | Count | Frequency (%) |
| 0 | 5151 | 50.0% |
| 0.006691672989 | 2 | < 0.1% |
| 0.007903321665 | 1 | < 0.1% |
| 0.008529143579 | 1 | < 0.1% |
| 0.008817081325 | 1 | < 0.1% |
| 0.009297964737 | 1 | < 0.1% |
| 0.009390334329 | 1 | < 0.1% |
| 0.009591194456 | 2 | < 0.1% |
| 0.01005527864 | 1 | < 0.1% |
| 0.01008939618 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0.912337222 | 1 | < 0.1% |
| 0.893985339 | 1 | < 0.1% |
| 0.8799837274 | 1 | < 0.1% |
| 0.855717183 | 1 | < 0.1% |
| 0.8442472452 | 1 | < 0.1% |
| 0.8210144641 | 1 | < 0.1% |
| 0.8153993827 | 1 | < 0.1% |
| 0.8117790523 | 1 | < 0.1% |
| 0.7967300018 | 1 | < 0.1% |
| 0.7950674542 | 1 | < 0.1% |
title_status
Categorical
IMBALANCE  MISSING 
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 174 |
| Missing (%) | 1.7% |
| Memory size | 160.9 KiB |
Length
| Max length | 10 |
|---|---|
| Median length | 5 |
| Mean length | 5.0549299 |
| Min length | 4 |
Characters and Unicode
| Total characters | 51166 |
|---|---|
| Distinct characters | 18 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | clean |
|---|---|
| 2nd row | clean |
| 3rd row | clean |
| 4th row | clean |
| 5th row | clean |
Common Values
| Value | Count | Frequency (%) |
| clean | 9802 | 95.2% |
| rebuilt | 183 | 1.8% |
| salvage | 90 | 0.9% |
| lien | 30 | 0.3% |
| missing | 15 | 0.1% |
| parts only | 2 | < 0.1% |
| (Missing) | 174 | 1.7% |
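The 90.3% imbalance figure reported for title_status in the alerts is consistent with a normalized-entropy score, 1 − H / log(k), where H is the Shannon entropy of the class frequencies and k the number of distinct classes. The sketch below assumes that definition and that missing values are excluded; the function name `imbalance_score` is hypothetical and this is not necessarily the tool's exact implementation.

```python
import math

# Non-missing class counts copied from the Common Values table.
counts = {
    "clean": 9802,
    "rebuilt": 183,
    "salvage": 90,
    "lien": 30,
    "missing": 15,
    "parts only": 2,
}

def imbalance_score(class_counts):
    """Return 1 - H / log(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(class_counts)
    k = len(class_counts)
    if k < 2:
        # Assumption: a single-class column is scored 0 rather than dividing by log(1) = 0.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in class_counts if c > 0)
    return 1 - entropy / math.log(k)

score = imbalance_score(list(counts.values()))
print(f"imbalance: {100 * score:.1f}%")
```

With these counts the score comes out at roughly 0.903, so the alert is driven almost entirely by the "clean" class holding about 97% of the non-missing rows.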
Most occurring characters
| Value | Count | Frequency (%) |
| l | 10107 | |
| e | 10105 | |
| a | 9984 | |
| n | 9849 | |
| c | 9802 | |
| i | 243 | 0.5% |
| r | 185 | 0.4% |
| t | 185 | 0.4% |
| u | 183 | 0.4% |
| b | 183 | 0.4% |
| Other values (8) | 340 | 0.7% |
tfidf_vehicle
Real number (ℝ)
HIGH CORRELATION  MISSING  ZEROS 
| Distinct | 4497 |
|---|---|
| Distinct (%) | 53.5% |
| Missing | 1895 |
| Missing (%) | 18.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.05705624 |
| Minimum | 0 |
| Maximum | 0.78422321 |
| Zeros | 3702 |
| Zeros (%) | 36.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0.023303367 |
| Q3 | 0.086514163 |
| 95-th percentile | 0.22003102 |
| Maximum | 0.78422321 |
| Range | 0.78422321 |
| Interquartile range (IQR) | 0.086514163 |
Descriptive statistics
| Standard deviation | 0.084877793 |
|---|---|
| Coefficient of variation (CV) | 1.4876163 |
| Kurtosis | 7.0588055 |
| Mean | 0.05705624 |
| Median Absolute Deviation (MAD) | 0.023303367 |
| Skewness | 2.3728722 |
| Sum | 479.32947 |
| Variance | 0.0072042397 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 3702 | 36.0% |
| 0.1592240969 | 6 | 0.1% |
| 0.01812210289 | 4 | < 0.1% |
| 0.3167593354 | 4 | < 0.1% |
| 0.08072379225 | 3 | < 0.1% |
| 0.04985596021 | 3 | < 0.1% |
| 0.04172056326 | 3 | < 0.1% |
| 0.01211148687 | 3 | < 0.1% |
| 0.08849096061 | 3 | < 0.1% |
| 0.0407401709 | 3 | < 0.1% |
| Other values (4487) | 4667 | 45.3% |
| (Missing) | 1895 | 18.4% |
| Value | Count | Frequency (%) |
| 0 | 3702 | 36.0% |
| 0.00409448288 | 1 | < 0.1% |
| 0.005624156937 | 1 | < 0.1% |
| 0.005836919751 | 1 | < 0.1% |
| 0.00653643387 | 1 | < 0.1% |
| 0.006560054913 | 2 | < 0.1% |
| 0.006866828549 | 1 | < 0.1% |
| 0.007592892725 | 1 | < 0.1% |
| 0.007835780162 | 1 | < 0.1% |
| 0.008041819881 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0.7842232101 | 1 | < 0.1% |
| 0.5128763411 | 1 | < 0.1% |
| 0.4927697212 | 1 | < 0.1% |
| 0.4927295434 | 1 | < 0.1% |
| 0.4925890991 | 1 | < 0.1% |
| 0.4825611542 | 1 | < 0.1% |
| 0.4822950945 | 1 | < 0.1% |
| 0.480478726 | 1 | < 0.1% |
| 0.4795482042 | 1 | < 0.1% |
| 0.4795089707 | 1 | < 0.1% |
lat
Real number (ℝ)
MISSING 
| Distinct | 5063 |
|---|---|
| Distinct (%) | 49.7% |
| Missing | 111 |
| Missing (%) | 1.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.527681 |
| Minimum | 19.5981 |
| Maximum | 77.86064 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 19.5981 |
|---|---|
| 5-th percentile | 28.3019 |
| Q1 | 34.445746 |
| median | 39.3 |
| Q3 | 42.37853 |
| 95-th percentile | 47.054557 |
| Maximum | 77.86064 |
| Range | 58.26254 |
| Interquartile range (IQR) | 7.932784 |
Descriptive statistics
| Standard deviation | 5.7883825 |
|---|---|
| Coefficient of variation (CV) | 0.15023958 |
| Kurtosis | 1.3887709 |
| Mean | 38.527681 |
| Median Absolute Deviation (MAD) | 3.702421 |
| Skewness | 0.067027135 |
| Sum | 392404.43 |
| Variance | 33.505372 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 33.779214 | 101 | 1.0% |
| 40.468785 | 92 | 0.9% |
| 43.1824 | 79 | 0.8% |
| 33.7865 | 77 | 0.7% |
| 27.26977 | 35 | 0.3% |
| 47.1991 | 34 | 0.3% |
| 46.234838 | 33 | 0.3% |
| 47.696062 | 33 | 0.3% |
| 47.81247 | 32 | 0.3% |
| 36.17 | 31 | 0.3% |
| Other values (5053) | 9638 | 93.6% |
| (Missing) | 111 | 1.1% |
| Value | Count | Frequency (%) |
| 19.5981 | 1 | < 0.1% |
| 19.641782 | 1 | < 0.1% |
| 19.646976 | 1 | < 0.1% |
| 19.719349 | 1 | < 0.1% |
| 20.77208 | 1 | < 0.1% |
| 20.877965 | 1 | < 0.1% |
| 20.886756 | 3 | < 0.1% |
| 20.889768 | 1 | < 0.1% |
| 20.89258 | 1 | < 0.1% |
| 20.9174 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 77.86064 | 1 | < 0.1% |
| 64.93614 | 1 | < 0.1% |
| 64.8378 | 2 | < 0.1% |
| 64.81552 | 1 | < 0.1% |
| 64.7805 | 1 | < 0.1% |
| 64.0378 | 1 | < 0.1% |
| 61.605649 | 1 | < 0.1% |
| 61.573915 | 1 | < 0.1% |
| 61.572407 | 2 | < 0.1% |
| 61.56939 | 3 | < 0.1% |
transmission
Categorical
HIGH CORRELATION 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 47 |
| Missing (%) | 0.5% |
| Memory size | 160.9 KiB |
Length
| Max length | 9 |
|---|---|
| Median length | 9 |
| Mean length | 8.1113279 |
| Min length | 5 |
Characters and Unicode
| Total characters | 83133 |
|---|---|
| Distinct characters | 12 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | automatic |
|---|---|
| 2nd row | other |
| 3rd row | other |
| 4th row | automatic |
| 5th row | automatic |
Common Values
| Value | Count | Frequency (%) |
| automatic | 7819 | 75.9% |
| other | 1818 | 17.7% |
| manual | 612 | 5.9% |
| (Missing) | 47 | 0.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 17456 | |
| a | 16862 | |
| o | 9637 | |
| u | 8431 | |
| m | 8431 | |
| i | 7819 | |
| c | 7819 | |
| h | 1818 | 2.2% |
| e | 1818 | 2.2% |
| r | 1818 | 2.2% |
| Other values (2) | 1224 | 1.5% |
type
Categorical
HIGH CORRELATION  MISSING 
| Distinct | 13 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 2118 |
| Missing (%) | 20.6% |
| Memory size | 160.9 KiB |
Length
| Max length | 11 |
|---|---|
| Median length | 5 |
| Mean length | 5.0606505 |
| Min length | 3 |
Characters and Unicode
| Total characters | 41386 |
|---|---|
| Distinct characters | 25 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | convertible |
|---|---|
| 2nd row | pickup |
| 3rd row | hatchback |
| 4th row | SUV |
| 5th row | convertible |
Common Values
| Value | Count | Frequency (%) |
| sedan | 2087 | 20.3% |
| SUV | 1844 | 17.9% |
| pickup | 1166 | 11.3% |
| truck | 784 | 7.6% |
| other | 536 | 5.2% |
| coupe | 507 | 4.9% |
| hatchback | 442 | 4.3% |
| wagon | 266 | 2.6% |
| convertible | 217 | 2.1% |
| van | 195 | 1.9% |
| Other values (3) | 134 | 1.3% |
| (Missing) | 2118 | 20.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 3564 | 8.6% |
| c | 3558 | 8.6% |
| a | 3555 | 8.6% |
| n | 2993 | 7.2% |
| p | 2839 | 6.9% |
| u | 2468 | 6.0% |
| k | 2392 | 5.8% |
| s | 2098 | 5.1% |
| d | 2096 | 5.1% |
| t | 1979 | 4.8% |
| Other values (15) | 13844 |
tfidf_text
Real number (ℝ)
HIGH CORRELATION  MISSING  ZEROS 
| Distinct | 3467 |
|---|---|
| Distinct (%) | 41.3% |
| Missing | 1895 |
| Missing (%) | 18.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.040443381 |
| Minimum | 0 |
| Maximum | 1 |
| Zeros | 4807 |
| Zeros (%) | 46.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.070583751 |
| 95-th percentile | 0.16262274 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.070583751 |
Descriptive statistics
| Standard deviation | 0.061215884 |
|---|---|
| Coefficient of variation (CV) | 1.5136194 |
| Kurtosis | 11.351614 |
| Mean | 0.040443381 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.1949981 |
| Sum | 339.76484 |
| Variance | 0.0037473844 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 4807 | 46.7% |
| 0.1135405116 | 4 | < 0.1% |
| 0.0633336666 | 4 | < 0.1% |
| 0.08635350668 | 3 | < 0.1% |
| 0.07838034789 | 3 | < 0.1% |
| 0.05154347547 | 3 | < 0.1% |
| 0.08026611417 | 3 | < 0.1% |
| 0.08751470076 | 3 | < 0.1% |
| 0.04232758613 | 3 | < 0.1% |
| 0.06001624309 | 3 | < 0.1% |
| Other values (3457) | 3565 | 34.6% |
| (Missing) | 1895 | 18.4% |
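The Frequency (%) column appears to be each count divided by the total number of observations (10296), rounded to one decimal place, with shares that round to zero shown as "< 0.1%". A minimal sketch of that arithmetic follows; the rounding rule is inferred from the printed values, and the helper name `frequency_label` is hypothetical.

```python
N_OBSERVATIONS = 10296  # number of observations reported for the dataset

def frequency_label(count, total=N_OBSERVATIONS):
    """Format count/total as a percentage string with one decimal place."""
    share = 100 * count / total
    # Assumed convention: a non-zero share that rounds to 0.0 is shown as "< 0.1%".
    if share > 0 and round(share, 1) < 0.1:
        return "< 0.1%"
    return f"{share:.1f}%"

print(frequency_label(4807))  # zeros in tfidf_text
print(frequency_label(1895))  # missing values
print(frequency_label(4))     # a value that occurs four times
```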
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 4807 | 46.7% |
| 0.001192460091 | 1 | < 0.1% |
| 0.002727602427 | 1 | < 0.1% |
| 0.006422821511 | 1 | < 0.1% |
| 0.006535357797 | 1 | < 0.1% |
| 0.006589449878 | 1 | < 0.1% |
| 0.00674613219 | 1 | < 0.1% |
| 0.006765997612 | 1 | < 0.1% |
| 0.007111321212 | 1 | < 0.1% |
| 0.007628716632 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1 | 1 | < 0.1% |
| 0.6114419968 | 1 | < 0.1% |
| 0.4776100548 | 1 | < 0.1% |
| 0.4635352866 | 1 | < 0.1% |
| 0.4608938853 | 1 | < 0.1% |
| 0.4487710757 | 1 | < 0.1% |
| 0.4449453972 | 1 | < 0.1% |
| 0.4322772687 | 1 | < 0.1% |
| 0.4116430558 | 1 | < 0.1% |
| 0.4092394588 | 1 | < 0.1% |
org_manuf
Categorical
HIGH CORRELATION  MISSING 
| Distinct | 40 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 397 |
| Missing (%) | 3.9% |
| Memory size | 160.9 KiB |
Length
| Max length | 15 |
|---|---|
| Median length | 12 |
| Mean length | 5.7937165 |
| Min length | 3 |
Characters and Unicode
| Total characters | 57352 |
|---|---|
| Distinct characters | 26 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 3 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | bmw |
|---|---|
| 2nd row | ford |
| 3rd row | alfa-romeo |
| 4th row | mercedes-benz |
| 5th row | toyota |
Common Values
| Value | Count | Frequency (%) |
| ford | 1784 | 17.3% |
| chevrolet | 1305 | 12.7% |
| toyota | 864 | 8.4% |
| honda | 512 | 5.0% |
| jeep | 430 | 4.2% |
| ram | 422 | 4.1% |
| nissan | 419 | 4.1% |
| gmc | 403 | 3.9% |
| bmw | 358 | 3.5% |
| dodge | 294 | 2.9% |
| Other values (30) | 3108 | 30.2% |
| (Missing) | 397 | 3.9% |
Words
| Value | Count | Frequency (%) |
| ford | 1784 | 18.0% |
| chevrolet | 1305 | 13.2% |
| toyota | 864 | 8.7% |
| honda | 512 | 5.2% |
| jeep | 430 | 4.3% |
| ram | 422 | 4.3% |
| nissan | 419 | 4.2% |
| gmc | 403 | 4.1% |
| bmw | 358 | 3.6% |
| dodge | 294 | 3.0% |
| Other values (30) | 3108 | 31.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| o | 6331 | 11.0% |
| e | 5668 | 9.9% |
| r | 4756 | 8.3% |
| a | 4570 | 8.0% |
| d | 3930 | 6.9% |
| t | 3365 | 5.9% |
| c | 2998 | 5.2% |
| n | 2696 | 4.7% |
| l | 2595 | 4.5% |
| i | 2332 | 4.1% |
| Other values (16) | 18111 | 31.6% |
year
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 92 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 23 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2011.1923 |
| Minimum | 1900 |
| Maximum | 2021 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 1900 |
|---|---|
| 5-th percentile | 1998 |
| Q1 | 2008 |
| median | 2013 |
| Q3 | 2017 |
| 95-th percentile | 2019.4 |
| Maximum | 2021 |
| Range | 121 |
| Interquartile range (IQR) | 9 |
Descriptive statistics
| Standard deviation | 9.5318846 |
|---|---|
| Coefficient of variation (CV) | 0.0047394199 |
| Kurtosis | 20.913912 |
| Mean | 2011.1923 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | -3.6897563 |
| Sum | 20660978 |
| Variance | 90.856825 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 2018 | 895 | 8.7% |
| 2017 | 846 | 8.2% |
| 2014 | 742 | 7.2% |
| 2015 | 740 | 7.2% |
| 2013 | 733 | 7.1% |
| 2016 | 726 | 7.1% |
| 2019 | 610 | 5.9% |
| 2012 | 555 | 5.4% |
| 2011 | 525 | 5.1% |
| 2020 | 478 | 4.6% |
| Other values (82) | 3423 | 33.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1900 | 1 | < 0.1% |
| 1916 | 1 | < 0.1% |
| 1921 | 2 | < 0.1% |
| 1923 | 1 | < 0.1% |
| 1926 | 1 | < 0.1% |
| 1927 | 1 | < 0.1% |
| 1928 | 1 | < 0.1% |
| 1929 | 2 | < 0.1% |
| 1932 | 1 | < 0.1% |
| 1933 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 2021 | 36 | 0.3% |
| 2020 | 478 | 4.6% |
| 2019 | 610 | 5.9% |
| 2018 | 895 | 8.7% |
| 2017 | 846 | 8.2% |
| 2016 | 726 | 7.1% |
| 2015 | 740 | 7.2% |
| 2014 | 742 | 7.2% |
| 2013 | 733 | 7.1% |
| 2012 | 555 | 5.4% |
tfidf_truck
Real number (ℝ)
HIGH CORRELATION  MISSING  ZEROS 
| Distinct | 1723 |
|---|---|
| Distinct (%) | 20.5% |
| Missing | 1895 |
| Missing (%) | 18.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.042502364 |
| Minimum | 0 |
| Maximum | 0.85359411 |
| Zeros | 6573 |
| Zeros (%) | 63.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0.27592258 |
| Maximum | 0.85359411 |
| Range | 0.85359411 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.11931126 |
|---|---|
| Coefficient of variation (CV) | 2.8071675 |
| Kurtosis | 16.730696 |
| Mean | 0.042502364 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.8893696 |
| Sum | 357.06236 |
| Variance | 0.014235176 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 0 | 6573 | 63.8% |
| 0.02115039754 | 6 | 0.1% |
| 0.1181588295 | 3 | < 0.1% |
| 0.1735436594 | 3 | < 0.1% |
| 0.07947088907 | 3 | < 0.1% |
| 0.1410554583 | 3 | < 0.1% |
| 0.7894767623 | 3 | < 0.1% |
| 0.7294941727 | 3 | < 0.1% |
| 0.01930582874 | 3 | < 0.1% |
| 0.7143425475 | 3 | < 0.1% |
| Other values (1713) | 1798 | 17.5% |
| (Missing) | 1895 | 18.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 6573 | 63.8% |
| 0.006162356024 | 1 | < 0.1% |
| 0.00618037779 | 1 | < 0.1% |
| 0.006404506296 | 1 | < 0.1% |
| 0.006412680234 | 1 | < 0.1% |
| 0.006432176253 | 1 | < 0.1% |
| 0.006445597622 | 1 | < 0.1% |
| 0.006459968457 | 1 | < 0.1% |
| 0.006471531092 | 1 | < 0.1% |
| 0.006507569428 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 0.8535941059 | 1 | < 0.1% |
| 0.8258080166 | 1 | < 0.1% |
| 0.797586157 | 1 | < 0.1% |
| 0.7894767623 | 3 | < 0.1% |
| 0.7877936592 | 1 | < 0.1% |
| 0.7852018309 | 1 | < 0.1% |
| 0.7782061692 | 1 | < 0.1% |
| 0.7771934359 | 2 | < 0.1% |
| 0.7761335711 | 2 | < 0.1% |
| 0.7737457025 | 1 | < 0.1% |
tfidf_credit
Real number (ℝ)
HIGH CORRELATION  MISSING  ZEROS 
| Distinct | 3093 |
|---|---|
| Distinct (%) | 36.8% |
| Missing | 1895 |
| Missing (%) | 18.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.045251695 |
| Minimum | 0 |
| Maximum | 0.70253508 |
| Zeros | 5168 |
| Zeros (%) | 50.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.069527976 |
| 95-th percentile | 0.21285 |
| Maximum | 0.70253508 |
| Range | 0.70253508 |
| Interquartile range (IQR) | 0.069527976 |
Descriptive statistics
| Standard deviation | 0.079613984 |
|---|---|
| Coefficient of variation (CV) | 1.7593592 |
| Kurtosis | 7.6256345 |
| Mean | 0.045251695 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.4290397 |
| Sum | 380.15949 |
| Variance | 0.0063383864 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 0 | 5168 | 50.2% |
| 0.1641678138 | 6 | 0.1% |
| 0.03014716913 | 4 | < 0.1% |
| 0.2690607229 | 4 | < 0.1% |
| 0.1317855871 | 3 | < 0.1% |
| 0.04585700264 | 3 | < 0.1% |
| 0.05083579423 | 3 | < 0.1% |
| 0.0327790753 | 3 | < 0.1% |
| 0.3014343525 | 3 | < 0.1% |
| 0.1948055302 | 3 | < 0.1% |
| Other values (3083) | 3201 | 31.1% |
| (Missing) | 1895 | 18.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 5168 | 50.2% |
| 0.002896921654 | 1 | < 0.1% |
| 0.004415684614 | 1 | < 0.1% |
| 0.007646757924 | 1 | < 0.1% |
| 0.008482220306 | 1 | < 0.1% |
| 0.009528079568 | 1 | < 0.1% |
| 0.009969638014 | 1 | < 0.1% |
| 0.01002152241 | 1 | < 0.1% |
| 0.01075773294 | 1 | < 0.1% |
| 0.0111376256 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 0.7025350754 | 1 | < 0.1% |
| 0.6954358808 | 2 | < 0.1% |
| 0.6537140918 | 1 | < 0.1% |
| 0.6009980876 | 1 | < 0.1% |
| 0.5992340727 | 1 | < 0.1% |
| 0.5950281433 | 1 | < 0.1% |
| 0.5356420447 | 1 | < 0.1% |
| 0.5351164781 | 1 | < 0.1% |
| 0.5225999558 | 1 | < 0.1% |
| 0.5122655147 | 1 | < 0.1% |
carvana_ad
Boolean
HIGH CORRELATION 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 90.5 KiB |
Common Values
| Value | Count | Frequency (%) |
| False | 8401 | 81.6% |
| True | 1895 | 18.4% |
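carvana_ad is True for 1895 rows, which is the same number of rows reported as missing for each tfidf_* column, and the alerts list those columns as highly correlated with carvana_ad. The report does not say whether these are the same rows. One way to check is sketched below on a tiny made-up sample; the rows and the helper `missing_matches_flag` are illustrative assumptions, not part of the profiled data.

```python
def missing_matches_flag(rows, flag_col, value_cols):
    """For each column, report whether a value is missing exactly when the flag is True."""
    return {
        col: all((row[col] is None) == bool(row[flag_col]) for row in rows)
        for col in value_cols
    }

# Made-up rows for illustration; None stands for a missing value.
sample = [
    {"carvana_ad": False, "tfidf_text": 0.0, "tfidf_truck": 0.0},
    {"carvana_ad": True, "tfidf_text": None, "tfidf_truck": None},
    {"carvana_ad": False, "tfidf_text": 0.07, "tfidf_truck": None},
    {"carvana_ad": True, "tfidf_text": None, "tfidf_truck": 0.2},
]

result = missing_matches_flag(sample, "carvana_ad", ["tfidf_text", "tfidf_truck"])
print(result)
```

With pandas, the same comparison could be written as `(df[col].isna() == df["carvana_ad"]).all()`, assuming the data is loaded into a DataFrame named `df`.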
cylinders
Categorical
MISSING 
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 4248 |
| Missing (%) | 41.3% |
| Memory size | 160.9 KiB |
Length
| Max length | 12 |
|---|---|
| Median length | 11 |
| Mean length | 10.974041 |
| Min length | 5 |
Characters and Unicode
| Total characters | 66371 |
|---|---|
| Distinct characters | 21 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 6 cylinders |
|---|---|
| 2nd row | 8 cylinders |
| 3rd row | 8 cylinders |
| 4th row | 4 cylinders |
| 5th row | 4 cylinders |
Common Values
| Value | Count | Frequency (%) |
| 6 cylinders | 2296 | 22.3% |
| 4 cylinders | 1842 | 17.9% |
| 8 cylinders | 1773 | 17.2% |
| 5 cylinders | 45 | 0.4% |
| 10 cylinders | 40 | 0.4% |
| other | 34 | 0.3% |
| 3 cylinders | 11 | 0.1% |
| 12 cylinders | 7 | 0.1% |
| (Missing) | 4248 | 41.3% |
Words
| Value | Count | Frequency (%) |
| cylinders | 6014 | 49.9% |
| 6 | 2296 | 19.0% |
| 4 | 1842 | 15.3% |
| 8 | 1773 | 14.7% |
| 5 | 45 | 0.4% |
| 10 | 40 | 0.3% |
| other | 34 | 0.3% |
| 3 | 11 | 0.1% |
| 12 | 7 | 0.1% |
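The percentages in the table above differ from those under Common Values because the rows here look like whitespace-separated tokens rather than whole values: "cylinders" occurs once in every value except "other". The sketch below rebuilds the token counts from the Common Values counts, assuming plain whitespace splitting (the report does not state which tokenizer was used).

```python
from collections import Counter

# Value counts copied from the Common Values table for cylinders.
value_counts = {
    "6 cylinders": 2296, "4 cylinders": 1842, "8 cylinders": 1773,
    "5 cylinders": 45, "10 cylinders": 40, "other": 34,
    "3 cylinders": 11, "12 cylinders": 7,
}

token_counts = Counter()
for value, count in value_counts.items():
    for token in value.split():  # assumed tokenizer: split on whitespace
        token_counts[token] += count

total_tokens = sum(token_counts.values())
share_of_6 = round(100 * token_counts["6"] / total_tokens, 1)
print(token_counts["cylinders"], total_tokens, share_of_6)
```

This gives 6014 occurrences of "cylinders" out of 12062 tokens and a 19.0% share for "6", matching the table.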
Most occurring characters
| Value | Count | Frequency (%) |
| e | 6048 | 9.1% |
| r | 6048 | 9.1% |
| c | 6014 | 9.1% |
| y | 6014 | 9.1% |
| (space) | 6014 | 9.1% |
| d | 6014 | 9.1% |
| l | 6014 | 9.1% |
| n | 6014 | 9.1% |
| i | 6014 | 9.1% |
| s | 6014 | 9.1% |
| Other values (11) | 6163 | 9.3% |
drive
Categorical
HIGH CORRELATION  MISSING 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 3076 |
| Missing (%) | 29.9% |
| Memory size | 160.9 KiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 21660 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | rwd |
|---|---|
| 2nd row | fwd |
| 3rd row | 4wd |
| 4th row | rwd |
| 5th row | fwd |
Common Values
| Value | Count | Frequency (%) |
| 4wd | 3175 | 30.8% |
| fwd | 2494 | 24.2% |
| rwd | 1551 | 15.1% |
| (Missing) | 3076 | 29.9% |
Words
| Value | Count | Frequency (%) |
| 4wd | 3175 | 44.0% |
| fwd | 2494 | 34.5% |
| rwd | 1551 | 21.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| w | 7220 | 33.3% |
| d | 7220 | 33.3% |
| 4 | 3175 | 14.7% |
| f | 2494 | 11.5% |
| r | 1551 | 7.2% |
tfidf_rear
Real number (ℝ)
HIGH CORRELATION  MISSING  ZEROS 
| Distinct | 2727 |
|---|---|
| Distinct (%) | 32.5% |
| Missing | 1895 |
| Missing (%) | 18.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.04352736 |
| Minimum | 0 |
| Maximum | 0.63290076 |
| Zeros | 5567 |
| Zeros (%) | 54.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.051563904 |
| 95-th percentile | 0.24335664 |
| Maximum | 0.63290076 |
| Range | 0.63290076 |
| Interquartile range (IQR) | 0.051563904 |
Descriptive statistics
| Standard deviation | 0.086448024 |
|---|---|
| Coefficient of variation (CV) | 1.9860617 |
| Kurtosis | 6.5669413 |
| Mean | 0.04352736 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.5146377 |
| Sum | 365.67335 |
| Variance | 0.0074732608 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 0 | 5567 | 54.1% |
| 0.01726722651 | 6 | 0.1% |
| 0.1886659891 | 4 | < 0.1% |
| 0.06341779012 | 4 | < 0.1% |
| 0.1575751949 | 3 | < 0.1% |
| 0.02833627755 | 3 | < 0.1% |
| 0.06340993549 | 3 | < 0.1% |
| 0.07218915027 | 3 | < 0.1% |
| 0.01576131688 | 3 | < 0.1% |
| 0.01767244463 | 3 | < 0.1% |
| Other values (2717) | 2802 | 27.2% |
| (Missing) | 1895 | 18.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 5567 | 54.1% |
| 0.005045672688 | 1 | < 0.1% |
| 0.005184678231 | 1 | < 0.1% |
| 0.005235324864 | 1 | < 0.1% |
| 0.005251241453 | 1 | < 0.1% |
| 0.005262198685 | 1 | < 0.1% |
| 0.005273931063 | 1 | < 0.1% |
| 0.005283370821 | 1 | < 0.1% |
| 0.005308737512 | 1 | < 0.1% |
| 0.00531279259 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 0.6329007625 | 1 | < 0.1% |
| 0.5939449776 | 1 | < 0.1% |
| 0.5895443466 | 1 | < 0.1% |
| 0.5637942842 | 1 | < 0.1% |
| 0.5424831746 | 1 | < 0.1% |
| 0.5301955365 | 1 | < 0.1% |
| 0.5243134798 | 2 | < 0.1% |
| 0.516264286 | 1 | < 0.1% |
| 0.5146175047 | 1 | < 0.1% |
| 0.5010833253 | 1 | < 0.1% |
odometer
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 7258 |
|---|---|
| Distinct (%) | 70.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 89672.649 |
| Minimum | 0 |
| Maximum | 299200 |
| Zeros | 21 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 7134 |
| Q1 | 36386 |
| median | 83813.5 |
| Q3 | 133000 |
| 95-th percentile | 200000 |
| Maximum | 299200 |
| Range | 299200 |
| Interquartile range (IQR) | 96614 |
Descriptive statistics
| Standard deviation | 61158.144 |
|---|---|
| Coefficient of variation (CV) | 0.68201558 |
| Kurtosis | -0.33728644 |
| Mean | 89672.649 |
| Median Absolute Deviation (MAD) | 48102 |
| Skewness | 0.540888 |
| Sum | 9.2326959 × 10^8 |
| Variance | 3.7403185 × 10^9 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 1 | 43 | 0.4% |
| 100000 | 37 | 0.4% |
| 200000 | 36 | 0.3% |
| 140000 | 35 | 0.3% |
| 150000 | 35 | 0.3% |
| 160000 | 32 | 0.3% |
| 170000 | 29 | 0.3% |
| 130000 | 27 | 0.3% |
| 180000 | 25 | 0.2% |
| 120000 | 24 | 0.2% |
| Other values (7248) | 9973 | 96.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 21 | 0.2% |
| 1 | 43 | 0.4% |
| 2 | 4 | < 0.1% |
| 3 | 4 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 3 | < 0.1% |
| 7 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 10 | 3 | < 0.1% |
| 13 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 299200 | 1 | < 0.1% |
| 298813 | 1 | < 0.1% |
| 298000 | 1 | < 0.1% |
| 297600 | 1 | < 0.1% |
| 297000 | 1 | < 0.1% |
| 296062 | 1 | < 0.1% |
| 296000 | 1 | < 0.1% |
| 295000 | 2 | < 0.1% |
| 293000 | 1 | < 0.1% |
| 292300 | 1 | < 0.1% |
paint_color
Categorical
MISSING 
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 2867 |
| Missing (%) | 27.8% |
| Memory size | 160.9 KiB |
Length
| Max length | 6 |
|---|---|
| Median length | 5 |
| Mean length | 4.7931081 |
| Min length | 3 |
Characters and Unicode
| Total characters | 35608 |
|---|---|
| Distinct characters | 21 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | white |
|---|---|
| 2nd row | black |
| 3rd row | white |
| 4th row | silver |
| 5th row | red |
Common Values
| Value | Count | Frequency (%) |
| white | 1950 | 18.9% |
| black | 1605 | 15.6% |
| silver | 1109 | 10.8% |
| red | 797 | 7.7% |
| blue | 780 | 7.6% |
| grey | 565 | 5.5% |
| green | 184 | 1.8% |
| custom | 171 | 1.7% |
| brown | 146 | 1.4% |
| orange | 53 | 0.5% |
| Other values (2) | 69 | 0.7% |
| (Missing) | 2867 | 27.8% |
Words
| Value | Count | Frequency (%) |
| white | 1950 | 26.2% |
| black | 1605 | 21.6% |
| silver | 1109 | 14.9% |
| red | 797 | 10.7% |
| blue | 780 | 10.5% |
| grey | 565 | 7.6% |
| green | 184 | 2.5% |
| custom | 171 | 2.3% |
| brown | 146 | 2.0% |
| orange | 53 | 0.7% |
| Other values (2) | 69 | 0.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 5691 | 16.0% |
| l | 3616 | 10.2% |
| i | 3059 | 8.6% |
| r | 2870 | 8.1% |
| b | 2531 | 7.1% |
| w | 2149 | 6.0% |
| t | 2121 | 6.0% |
| h | 1950 | 5.5% |
| c | 1776 | 5.0% |
| a | 1658 | 4.7% |
| Other values (11) | 8187 | 23.0% |
fuel
Categorical
IMBALANCE 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 60 |
| Missing (%) | 0.6% |
| Memory size | 160.9 KiB |
Length
| Max length | 8 |
|---|---|
| Median length | 3 |
| Mean length | 3.4238961 |
| Min length | 3 |
Characters and Unicode
| Total characters | 35047 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | gas |
|---|---|
| 2nd row | other |
| 3rd row | other |
| 4th row | diesel |
| 5th row | gas |
Common Values
| Value | Count | Frequency (%) |
| gas | 8558 | 83.1% |
| other | 789 | 7.7% |
| diesel | 699 | 6.8% |
| hybrid | 143 | 1.4% |
| electric | 47 | 0.5% |
| (Missing) | 60 | 0.6% |
Words
| Value | Count | Frequency (%) |
| gas | 8558 | 83.6% |
| other | 789 | 7.7% |
| diesel | 699 | 6.8% |
| hybrid | 143 | 1.4% |
| electric | 47 | 0.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| s | 9257 | 26.4% |
| g | 8558 | 24.4% |
| a | 8558 | 24.4% |
| e | 2281 | 6.5% |
| r | 979 | 2.8% |
| h | 932 | 2.7% |
| i | 889 | 2.5% |
| d | 842 | 2.4% |
| t | 836 | 2.4% |
| o | 789 | 2.3% |
| Other values (4) | 1126 | 3.2% |
long
Real number (ℝ)
MISSING 
| Distinct | 5075 |
|---|---|
| Distinct (%) | 49.8% |
| Missing | 111 |
| Missing (%) | 1.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -94.217343 |
| Minimum | -159.38468 |
| Maximum | 167.62991 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 10183 |
| Negative (%) | 98.9% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | -159.38468 |
|---|---|
| 5-th percentile | -122.44918 |
| Q1 | -110.96 |
| median | -87.924 |
| Q3 | -80.8419 |
| 95-th percentile | -72.999092 |
| Maximum | 167.62991 |
| Range | 327.01459 |
| Interquartile range (IQR) | 30.1181 |
Descriptive statistics
| Standard deviation | 18.184987 |
|---|---|
| Coefficient of variation (CV) | -0.19301103 |
| Kurtosis | 5.1282479 |
| Mean | -94.217343 |
| Median Absolute Deviation (MAD) | 10.0517 |
| Skewness | -0.37995077 |
| Sum | -959603.63 |
| Variance | 330.69374 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| -84.411811 | 101 | 1.0% |
| -74.281707 | 92 | 0.9% |
| -84.1122 | 79 | 0.8% |
| -84.4454 | 77 | 0.7% |
| -82.48229 | 35 | 0.3% |
| -122.3151 | 34 | 0.3% |
| -119.128015 | 33 | 0.3% |
| -116.781406 | 33 | 0.3% |
| -122.32164 | 32 | 0.3% |
| -117.236949 | 25 | 0.2% |
| Other values (5065) | 9644 | 93.7% |
| (Missing) | 111 | 1.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -159.384676 | 1 | < 0.1% |
| -159.3448 | 1 | < 0.1% |
| -158.030906 | 2 | < 0.1% |
| -158.02241 | 1 | < 0.1% |
| -158.0124 | 2 | < 0.1% |
| -158.00528 | 1 | < 0.1% |
| -157.9269 | 1 | < 0.1% |
| -157.903554 | 1 | < 0.1% |
| -157.902016 | 1 | < 0.1% |
| -157.900292 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 167.629911 | 1 | < 0.1% |
| 94.1632 | 1 | < 0.1% |
| -67.84049 | 1 | < 0.1% |
| -68.7778 | 1 | < 0.1% |
| -68.805028 | 2 | < 0.1% |
| -68.856 | 1 | < 0.1% |
| -69.462948 | 1 | < 0.1% |
| -69.6826 | 1 | < 0.1% |
| -70.112543 | 2 | < 0.1% |
| -70.169672 | 2 | < 0.1% |
manufacturer
Categorical
HIGH CORRELATION  MISSING 
| Distinct | 10 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 3505 |
| Missing (%) | 34.0% |
| Memory size | 160.9 KiB |
Length
| Max length | 9 |
|---|---|
| Median length | 6 |
| Mean length | 5.2831689 |
| Min length | 3 |
Characters and Unicode
| Total characters | 35878 |
|---|---|
| Distinct characters | 21 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | bmw |
|---|---|
| 2nd row | ford |
| 3rd row | toyota |
| 4th row | jeep |
| 5th row | ford |
Common Values
| Value | Count | Frequency (%) |
| ford | 1784 | 17.3% |
| chevrolet | 1305 | 12.7% |
| toyota | 864 | 8.4% |
| honda | 512 | 5.0% |
| jeep | 430 | 4.2% |
| ram | 422 | 4.1% |
| nissan | 419 | 4.1% |
| gmc | 403 | 3.9% |
| bmw | 358 | 3.5% |
| dodge | 294 | 2.9% |
| (Missing) | 3505 | 34.0% |
Words
| Value | Count | Frequency (%) |
| ford | 1784 | 26.3% |
| chevrolet | 1305 | 19.2% |
| toyota | 864 | 12.7% |
| honda | 512 | 7.5% |
| jeep | 430 | 6.3% |
| ram | 422 | 6.2% |
| nissan | 419 | 6.2% |
| gmc | 403 | 5.9% |
| bmw | 358 | 5.3% |
| dodge | 294 | 4.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| o | 5623 | 15.7% |
| e | 3764 | 10.5% |
| r | 3511 | 9.8% |
| t | 3033 | 8.5% |
| d | 2884 | 8.0% |
| a | 2217 | 6.2% |
| h | 1817 | 5.1% |
| f | 1784 | 5.0% |
| c | 1708 | 4.8% |
| n | 1350 | 3.8% |
| Other values (11) | 8187 | 22.8% |
region
Text
| Distinct | 392 |
|---|---|
| Distinct (%) | 3.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 160.9 KiB |
Length
| Max length | 26 |
|---|---|
| Median length | 20 |
| Mean length | 11.504856 |
| Min length | 4 |
Characters and Unicode
| Total characters | 118454 |
|---|---|
| Distinct characters | 52 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 10 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | fredericksburg |
|---|---|
| 2nd row | holland |
| 3rd row | lawrence |
| 4th row | phoenix |
| 5th row | san luis obispo |
Words
| Value | Count | Frequency (%) |
| | 1587 | 8.7% |
| city | 317 | 1.7% |
| st | 216 | 1.2% |
| new | 213 | 1.2% |
| bay | 206 | 1.1% |
| san | 189 | 1.0% |
| south | 189 | 1.0% |
| county | 170 | 0.9% |
| jersey | 163 | 0.9% |
| central | 163 | 0.9% |
| Other values (479) | 14782 | 81.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 11691 | 9.9% |
| e | 9864 | 8.3% |
| o | 8879 | 7.5% |
| n | 8536 | 7.2% |
| (space) | 7899 | 6.7% |
| s | 7706 | 6.5% |
| l | 7200 | 6.1% |
| r | 7019 | 5.9% |
| t | 7009 | 5.9% |
| i | 6612 | 5.6% |
| Other values (42) | 36039 | 30.4% |
tfidf_car
Real number (ℝ)
HIGH CORRELATION  MISSING  ZEROS 
| Distinct | 3077 |
|---|---|
| Distinct (%) | 36.6% |
| Missing | 1895 |
| Missing (%) | 18.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.042918373 |
| Minimum | 0 |
| Maximum | 0.78660514 |
| Zeros | 5227 |
| Zeros (%) | 50.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.044508372 |
| 95-th percentile | 0.22920906 |
| Maximum | 0.78660514 |
| Range | 0.78660514 |
| Interquartile range (IQR) | 0.044508372 |
Descriptive statistics
| Standard deviation | 0.089204078 |
|---|---|
| Coefficient of variation (CV) | 2.078459 |
| Kurtosis | 11.247861 |
| Mean | 0.042918373 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.0882239 |
| Sum | 360.55725 |
| Variance | 0.0079573676 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 0 | 5227 | 50.8% |
| 0.0601891201 | 4 | < 0.1% |
| 0.1119129965 | 4 | < 0.1% |
| 0.1246509489 | 3 | < 0.1% |
| 0.02836665501 | 3 | < 0.1% |
| 0.1445830141 | 3 | < 0.1% |
| 0.04499993186 | 3 | < 0.1% |
| 0.03181529492 | 3 | < 0.1% |
| 0.2129619994 | 3 | < 0.1% |
| 0.1522412458 | 3 | < 0.1% |
| Other values (3067) | 3145 | 30.5% |
| (Missing) | 1895 | 18.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 5227 | 50.8% |
| 0.005014388554 | 1 | < 0.1% |
| 0.005051061027 | 1 | < 0.1% |
| 0.005076840055 | 1 | < 0.1% |
| 0.005855108942 | 1 | < 0.1% |
| 0.006321366936 | 1 | < 0.1% |
| 0.00668286744 | 2 | < 0.1% |
| 0.006757677952 | 2 | < 0.1% |
| 0.006771362061 | 1 | < 0.1% |
| 0.006809621865 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 0.7866051387 | 1 | < 0.1% |
| 0.6972666432 | 1 | < 0.1% |
| 0.6649944034 | 1 | < 0.1% |
| 0.6543392397 | 1 | < 0.1% |
| 0.6285982605 | 1 | < 0.1% |
| 0.6256799014 | 1 | < 0.1% |
| 0.6219419389 | 1 | < 0.1% |
| 0.6168228805 | 1 | < 0.1% |
| 0.5967558639 | 1 | < 0.1% |
| 0.5932451852 | 1 | < 0.1% |
description_exists
Boolean
CONSTANT 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 90.5 KiB |
Common Values
| Value | Count | Frequency (%) |
| True | 10296 | 100.0% |
| | carvana_ad | condition | cylinders | drive | fuel | lat | long | manufacturer | odometer | org_manuf | paint_color | price | tfidf_auto | tfidf_car | tfidf_credit | tfidf_miles | tfidf_new | tfidf_power | tfidf_rear | tfidf_text | tfidf_truck | tfidf_vehicle | title_status | transmission | type | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| carvana_ad | 1.000 | 0.617 | 0.277 | 0.161 | 0.403 | 0.187 | 0.159 | 0.133 | 0.525 | 0.277 | 0.185 | 0.457 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.084 | 0.878 | 0.360 | 0.283 |
| condition | 0.617 | 1.000 | 0.098 | 0.096 | 0.151 | 0.061 | 0.075 | 0.055 | 0.200 | 0.088 | 0.081 | 0.171 | 0.030 | 0.068 | 0.096 | 0.067 | 0.081 | 0.042 | 0.000 | 0.023 | 0.107 | 0.059 | 0.163 | 0.399 | 0.146 | 0.122 |
| cylinders | 0.277 | 0.098 | 1.000 | 0.388 | 0.189 | 0.028 | 0.037 | 0.211 | 0.067 | 0.320 | 0.105 | 0.144 | 0.000 | 0.046 | 0.039 | 0.066 | 0.000 | 0.017 | 0.057 | 0.033 | 0.128 | 0.042 | 0.059 | 0.174 | 0.248 | 0.101 |
| drive | 0.161 | 0.096 | 0.388 | 1.000 | 0.161 | 0.158 | 0.041 | 0.394 | 0.105 | 0.458 | 0.116 | 0.258 | 0.070 | 0.116 | 0.061 | 0.069 | 0.041 | 0.068 | 0.065 | 0.035 | 0.190 | 0.085 | 0.041 | 0.132 | 0.557 | 0.183 |
| fuel | 0.403 | 0.151 | 0.189 | 0.161 | 1.000 | 0.044 | 0.030 | 0.183 | 0.130 | 0.336 | 0.083 | 0.193 | 0.031 | 0.047 | 0.040 | 0.028 | 0.000 | 0.014 | 0.023 | 0.034 | 0.147 | 0.031 | 0.031 | 0.266 | 0.238 | 0.081 |
| lat | 0.187 | 0.061 | 0.028 | 0.158 | 0.044 | 1.000 | -0.025 | 0.041 | 0.067 | 0.046 | 0.031 | -0.043 | 0.081 | -0.028 | -0.033 | -0.037 | -0.028 | 0.068 | 0.089 | 0.020 | 0.015 | 0.124 | 0.000 | 0.115 | 0.077 | -0.067 |
| long | 0.159 | 0.075 | 0.037 | 0.041 | 0.030 | -0.025 | 1.000 | 0.055 | 0.012 | 0.060 | 0.012 | -0.080 | -0.055 | -0.012 | -0.124 | 0.014 | 0.045 | -0.064 | -0.091 | -0.089 | -0.012 | -0.125 | 0.016 | 0.088 | 0.041 | -0.019 |
| manufacturer | 0.133 | 0.055 | 0.211 | 0.394 | 0.183 | 0.041 | 0.055 | 1.000 | 0.051 | 1.000 | 0.095 | 0.101 | 0.022 | 0.053 | 0.032 | 0.036 | 0.014 | 0.031 | 0.037 | 0.025 | 0.096 | 0.039 | 0.031 | 0.098 | 0.255 | 0.062 |
| odometer | 0.525 | 0.200 | 0.067 | 0.105 | 0.130 | 0.067 | 0.012 | 0.051 | 1.000 | 0.085 | 0.051 | -0.611 | -0.031 | -0.079 | -0.097 | -0.027 | -0.014 | -0.065 | -0.158 | -0.071 | 0.021 | -0.188 | 0.016 | 0.350 | 0.092 | -0.646 |
| org_manuf | 0.277 | 0.088 | 0.320 | 0.458 | 0.336 | 0.046 | 0.060 | 1.000 | 0.085 | 1.000 | 0.110 | 0.239 | 0.000 | 0.083 | 0.052 | 0.049 | 0.035 | 0.041 | 0.051 | 0.048 | 0.116 | 0.031 | 0.021 | 0.193 | 0.267 | 0.099 |
| paint_color | 0.185 | 0.081 | 0.105 | 0.116 | 0.083 | 0.031 | 0.012 | 0.095 | 0.051 | 0.110 | 1.000 | 0.071 | 0.013 | 0.036 | 0.032 | 0.055 | 0.038 | 0.037 | 0.019 | 0.041 | 0.070 | 0.060 | 0.014 | 0.136 | 0.092 | 0.089 |
| price | 0.457 | 0.171 | 0.144 | 0.258 | 0.193 | -0.043 | -0.080 | 0.101 | -0.611 | 0.239 | 0.071 | 1.000 | 0.072 | -0.043 | 0.236 | -0.080 | -0.005 | 0.165 | 0.257 | 0.124 | 0.304 | 0.285 | 0.036 | 0.300 | 0.147 | 0.680 |
| tfidf_auto | 1.000 | 0.030 | 0.000 | 0.070 | 0.031 | 0.081 | -0.055 | 0.022 | -0.031 | 0.000 | 0.013 | 0.072 | 1.000 | 0.062 | 0.273 | -0.112 | -0.098 | 0.121 | 0.115 | 0.172 | -0.009 | 0.232 | 0.000 | 0.045 | 0.037 | 0.143 |
| tfidf_car | 1.000 | 0.068 | 0.046 | 0.116 | 0.047 | -0.028 | -0.012 | 0.053 | -0.079 | 0.083 | 0.036 | -0.043 | 0.062 | 1.000 | 0.184 | 0.046 | 0.157 | -0.063 | 0.015 | 0.094 | -0.061 | 0.118 | 0.079 | 0.063 | 0.100 | 0.004 |
| tfidf_credit | 1.000 | 0.096 | 0.039 | 0.061 | 0.040 | -0.033 | -0.124 | 0.032 | -0.097 | 0.052 | 0.032 | 0.236 | 0.273 | 0.184 | 1.000 | -0.096 | -0.002 | 0.021 | 0.057 | 0.311 | 0.082 | 0.414 | 0.039 | 0.080 | 0.044 | 0.255 |
| tfidf_miles | 1.000 | 0.067 | 0.066 | 0.069 | 0.028 | -0.037 | 0.014 | 0.036 | -0.027 | 0.049 | 0.055 | -0.080 | -0.112 | 0.046 | -0.096 | 1.000 | 0.172 | 0.004 | -0.080 | -0.108 | 0.004 | -0.106 | 0.039 | 0.052 | 0.041 | -0.064 |
| tfidf_new | 1.000 | 0.081 | 0.000 | 0.041 | 0.000 | -0.028 | 0.045 | 0.014 | -0.014 | 0.035 | 0.038 | -0.005 | -0.098 | 0.157 | -0.002 | 0.172 | 1.000 | -0.074 | -0.012 | -0.115 | 0.092 | -0.043 | 0.017 | 0.098 | 0.045 | -0.095 |
| tfidf_power | 1.000 | 0.042 | 0.017 | 0.068 | 0.014 | 0.068 | -0.064 | 0.031 | -0.065 | 0.041 | 0.037 | 0.165 | 0.121 | -0.063 | 0.021 | 0.004 | -0.074 | 1.000 | 0.393 | -0.012 | 0.065 | 0.154 | 0.022 | 0.050 | 0.054 | 0.157 |
| tfidf_rear | 1.000 | 0.000 | 0.057 | 0.065 | 0.023 | 0.089 | -0.091 | 0.037 | -0.158 | 0.051 | 0.019 | 0.257 | 0.115 | 0.015 | 0.057 | -0.080 | -0.012 | 0.393 | 1.000 | 0.054 | 0.054 | 0.234 | 0.000 | 0.031 | 0.056 | 0.205 |
| tfidf_text | 1.000 | 0.023 | 0.033 | 0.035 | 0.034 | 0.020 | -0.089 | 0.025 | -0.071 | 0.048 | 0.041 | 0.124 | 0.172 | 0.094 | 0.311 | -0.108 | -0.115 | -0.012 | 0.054 | 1.000 | 0.021 | 0.230 | 0.043 | 0.039 | 0.035 | 0.148 |
| tfidf_truck | 1.000 | 0.107 | 0.128 | 0.190 | 0.147 | 0.015 | -0.012 | 0.096 | 0.021 | 0.116 | 0.070 | 0.304 | -0.009 | -0.061 | 0.082 | 0.004 | 0.092 | 0.065 | 0.054 | 0.021 | 1.000 | 0.091 | 0.008 | 0.030 | 0.174 | 0.041 |
| tfidf_vehicle | 1.000 | 0.059 | 0.042 | 0.085 | 0.031 | 0.124 | -0.125 | 0.039 | -0.188 | 0.031 | 0.060 | 0.285 | 0.232 | 0.118 | 0.414 | -0.106 | -0.043 | 0.154 | 0.234 | 0.230 | 0.091 | 1.000 | 0.000 | 0.055 | 0.065 | 0.307 |
| title_status | 0.084 | 0.163 | 0.059 | 0.041 | 0.031 | 0.000 | 0.016 | 0.031 | 0.016 | 0.021 | 0.014 | 0.036 | 0.000 | 0.079 | 0.039 | 0.039 | 0.017 | 0.022 | 0.000 | 0.043 | 0.008 | 0.000 | 1.000 | 0.062 | 0.029 | 0.140 |
| transmission | 0.878 | 0.399 | 0.174 | 0.132 | 0.266 | 0.115 | 0.088 | 0.098 | 0.350 | 0.193 | 0.136 | 0.300 | 0.045 | 0.063 | 0.080 | 0.052 | 0.098 | 0.050 | 0.031 | 0.039 | 0.030 | 0.055 | 0.062 | 1.000 | 0.299 | 0.277 |
| type | 0.360 | 0.146 | 0.248 | 0.557 | 0.238 | 0.077 | 0.041 | 0.255 | 0.092 | 0.267 | 0.092 | 0.147 | 0.037 | 0.100 | 0.044 | 0.041 | 0.045 | 0.054 | 0.056 | 0.035 | 0.174 | 0.065 | 0.029 | 0.299 | 1.000 | 0.103 |
| year | 0.283 | 0.122 | 0.101 | 0.183 | 0.081 | -0.067 | -0.019 | 0.062 | -0.646 | 0.099 | 0.089 | 0.680 | 0.143 | 0.004 | 0.255 | -0.064 | -0.095 | 0.157 | 0.205 | 0.148 | 0.041 | 0.307 | 0.140 | 0.277 | 0.103 | 1.000 |
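The high-correlation alerts for this matrix can be approximated by thresholding the absolute coefficients. The following is a minimal sketch, assuming a cutoff of 0.5 (the report does not state the threshold it applied) and a hypothetical helper named `high_pairs`; the coefficients are a small subset copied from the matrix above.

```python
# A few coefficients copied from the correlation matrix.
# The matrix is symmetric, so each pair is listed once.
corr = {
    ("price", "year"): 0.680,
    ("odometer", "year"): -0.646,
    ("odometer", "price"): -0.611,
    ("drive", "type"): 0.557,
    ("lat", "long"): -0.025,
    ("price", "type"): 0.147,
}


def high_pairs(corr, threshold=0.5):
    """Return (pair, value) entries with |value| >= threshold, strongest first.

    The default threshold of 0.5 is an assumption, not a value taken from the report.
    """
    hits = [(pair, value) for pair, value in corr.items() if abs(value) >= threshold]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


for (a, b), value in high_pairs(corr):
    print(f"{a} ~ {b}: {value:+.3f}")
```

Sorting by absolute value keeps strong negative relationships, such as odometer against year, next to the strong positive ones instead of pushing them to the end of the list.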
| | tfidf_auto | price | model | tfidf_miles | condition | state | tfidf_power | tfidf_new | title_status | tfidf_vehicle | lat | transmission | type | tfidf_text | org_manuf | year | tfidf_truck | tfidf_credit | carvana_ad | cylinders | drive | tfidf_rear | odometer | paint_color | fuel | long | manufacturer | region | tfidf_car | description_exists |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 189846 | 0.028732 | 12989 | 3 series | 0.023903 | None | va | 0.0 | 0.056003 | clean | 0.022641 | NaN | automatic | convertible | 0.079128 | bmw | 2011.0 | 0.0 | 0.280134 | False | 6 cylinders | rwd | 0.029465 | 99178 | white | gas | NaN | bmw | fredericksburg | 0.055929 | True |
| 306451 | NaN | 35990 | ranger supercrew lariat | NaN | -1 | mi | NaN | NaN | clean | NaN | 42.770000 | other | pickup | NaN | ford | 2020.0 | NaN | NaN | True | None | None | NaN | 4250 | black | other | -86.100000 | ford | holland | NaN | True |
| 282208 | NaN | 31990 | romeo stelvio ti sport | NaN | -1 | ks | NaN | NaN | clean | NaN | 38.960000 | other | hatchback | NaN | alfa-romeo | 2018.0 | NaN | NaN | True | None | None | NaN | 24008 | white | other | -95.250000 | None | lawrence | NaN | True |
| 240396 | 0.000000 | 34500 | s320 | 0.000000 | None | az | 0.0 | 0.000000 | clean | 0.000000 | 34.365500 | automatic | None | 0.000000 | mercedes-benz | 1998.0 | 0.0 | 0.000000 | False | None | None | 0.000000 | 100000 | None | diesel | -112.129600 | None | phoenix | 0.000000 | True |
| 152941 | 0.230520 | 3995 | corolla | 0.095889 | 1 | ca | 0.0 | 0.000000 | clean | 0.000000 | 35.122159 | automatic | None | 0.158715 | toyota | 2007.0 | 0.0 | 0.056189 | False | None | fwd | 0.000000 | 253148 | silver | gas | -120.626169 | toyota | san luis obispo | 0.056091 | True |
| 23838 | 0.000000 | 5000 | grand cherokee | 0.000000 | -1 | fl | 0.0 | 0.000000 | clean | 0.000000 | 30.421200 | automatic | SUV | 0.000000 | jeep | 2005.0 | 0.0 | 0.000000 | False | 8 cylinders | 4wd | 0.204941 | 125000 | red | gas | -86.892600 | jeep | pensacola | 0.000000 | True |
| 192760 | 0.000000 | 6500 | mustang convertible | 0.159649 | 1 | tx | 0.0 | 0.000000 | clean | 0.000000 | 33.334000 | manual | convertible | 0.000000 | ford | 2002.0 | 0.0 | 0.000000 | False | 8 cylinders | rwd | 0.000000 | 140000 | black | gas | -96.750000 | ford | dallas / fort worth | 0.000000 | True |
| 119587 | 0.000000 | 7905 | versa | 0.000000 | None | nv | 0.0 | 0.000000 | clean | 0.000000 | 36.143249 | automatic | hatchback | 0.000000 | nissan | 2008.0 | 0.0 | 0.000000 | False | 4 cylinders | fwd | 0.000000 | 28954 | black | gas | -115.226909 | nissan | las vegas | 0.000000 | True |
| 236132 | 0.000000 | 2500 | tahoe | 0.171397 | None | ar | 0.0 | 0.000000 | salvage | 0.000000 | 35.093700 | automatic | None | 0.189131 | chevrolet | 2004.0 | 0.0 | 0.000000 | False | None | None | 0.000000 | 187000 | None | gas | -91.907400 | chevrolet | little rock | 0.000000 | True |
| 24098 | 0.000000 | 8950 | murano | 0.000000 | -1 | ny | 0.0 | 0.000000 | clean | 0.000000 | 41.187900 | automatic | SUV | 0.000000 | nissan | 2009.0 | 0.0 | 0.000000 | False | 4 cylinders | fwd | 0.000000 | 94075 | silver | gas | -73.167700 | nissan | new york city | 0.697267 | True |
| | tfidf_auto | price | model | tfidf_miles | condition | state | tfidf_power | tfidf_new | title_status | tfidf_vehicle | lat | transmission | type | tfidf_text | org_manuf | year | tfidf_truck | tfidf_credit | carvana_ad | cylinders | drive | tfidf_rear | odometer | paint_color | fuel | long | manufacturer | region | tfidf_car | description_exists |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 199791 | 0.000000 | 12514 | soul | 0.024247 | None | ga | 0.000000 | 0.028404 | clean | 0.022967 | NaN | automatic | wagon | 0.080266 | kia | 2016.0 | 0.0 | 0.284162 | False | 4 cylinders | fwd | 0.029888 | 89291 | black | gas | NaN | None | columbus | 0.028367 | True |
| 297102 | NaN | 17990 | sonata se sedan 4d | NaN | -1 | sc | NaN | NaN | clean | NaN | 32.780000 | other | sedan | NaN | hyundai | 2018.0 | NaN | NaN | True | None | fwd | NaN | 25065 | silver | gas | -79.990000 | None | charleston | NaN | True |
| 97201 | 0.000000 | 7990 | murano | 0.000000 | 1 | ny | 0.000000 | 0.041311 | clean | 0.022269 | 40.859538 | automatic | SUV | 0.064856 | nissan | 2010.0 | 0.0 | 0.110210 | False | None | 4wd | 0.000000 | 114955 | None | gas | -73.075599 | nissan | long island | 0.041257 | True |
| 164202 | 0.000000 | 6900 | 1998 Jep Wrangler | 0.000000 | -1 | va | 0.000000 | 0.200189 | clean | 0.000000 | 37.125600 | manual | SUV | 0.000000 | None | 1998.0 | 0.0 | 0.000000 | False | 4 cylinders | 4wd | 0.000000 | 185820 | red | gas | -76.446900 | None | norfolk / hampton roads | 0.000000 | True |
| 166630 | 0.000000 | 19995 | gx 470 | 0.007926 | 1 | va | 0.000000 | 0.018571 | clean | 0.015016 | 36.917094 | automatic | SUV | 0.043732 | lexus | 2007.0 | 0.0 | 0.055736 | False | None | 4wd | 0.000000 | 93954 | white | gas | -76.232240 | None | norfolk / hampton roads | 0.009273 | True |
| 225663 | 0.000000 | 4500 | a6 3.2 quattro | 0.000000 | 1 | ia | 0.361537 | 0.158313 | clean | 0.000000 | 41.729432 | automatic | sedan | 0.074562 | audi | 2005.0 | 0.0 | 0.000000 | False | 6 cylinders | fwd | 0.083293 | 163600 | white | gas | -93.604582 | None | des moines | 0.000000 | True |
| 57062 | 0.201357 | 7495 | g35 | 0.000000 | None | oh | 0.000000 | 0.000000 | clean | 0.052892 | 39.355403 | automatic | sedan | 0.123232 | infiniti | 2008.0 | 0.0 | 0.000000 | False | None | 4wd | 0.000000 | 106704 | None | gas | -84.396202 | None | cincinnati | 0.000000 | True |
| 318902 | NaN | 29990 | tacoma double cab pickup | NaN | -1 | ca | NaN | NaN | clean | NaN | 33.779214 | other | pickup | NaN | toyota | 2012.0 | NaN | NaN | True | 6 cylinders | 4wd | NaN | 43182 | white | gas | -84.411811 | toyota | merced | NaN | True |
| 324054 | NaN | 12590 | spark ev 1lt hatchback | NaN | -1 | ca | NaN | NaN | clean | NaN | 36.600000 | other | hatchback | NaN | chevrolet | 2016.0 | NaN | NaN | True | None | fwd | NaN | 26063 | silver | electric | -121.880000 | chevrolet | monterey bay | NaN | True |
| 199450 | 0.000000 | 3200 | tribute | 0.089603 | -1 | mi | 0.000000 | 0.000000 | clean | 0.084874 | 42.982100 | automatic | SUV | 0.000000 | mazda | 2004.0 | 0.0 | 0.000000 | False | 6 cylinders | rwd | 0.000000 | 150000 | black | gas | -83.734000 | None | flint | 0.104828 | True |
Most frequently occurring
| | tfidf_auto | price | model | tfidf_miles | condition | state | tfidf_power | tfidf_new | title_status | tfidf_vehicle | lat | transmission | type | tfidf_text | org_manuf | year | tfidf_truck | tfidf_credit | carvana_ad | cylinders | drive | tfidf_rear | odometer | paint_color | fuel | long | manufacturer | region | tfidf_car | description_exists | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 0.0 | 23998 | renegade | 0.000000 | NaN | mt | 0.055054 | 0.0 | clean | 0.316759 | 47.798900 | automatic | SUV | 0.113541 | jeep | 2019.0 | 0.000000 | 0.030147 | False | 4 cylinders | 4wd | 0.063418 | 13120 | NaN | gas | -116.742300 | jeep | bozeman | 0.060189 | True | 3 |
| 0 | 0.0 | 6990 | accord | 0.000000 | NaN | fl | 0.000000 | 0.0 | clean | 0.147447 | 25.869874 | automatic | sedan | 0.128826 | honda | 2009.0 | 0.058758 | 0.045608 | False | NaN | NaN | 0.000000 | 128500 | green | gas | -80.242697 | honda | south florida | 0.182112 | True | 2 |
| 1 | 0.0 | 7900 | highlander | 0.000000 | -1 | wi | 0.076276 | 0.0 | clean | 0.135035 | 43.120500 | automatic | SUV | 0.000000 | toyota | 2009.0 | 0.000000 | 0.000000 | False | 6 cylinders | fwd | 0.000000 | 190524 | custom | gas | -89.352300 | toyota | madison | 0.000000 | True | 2 |
| 2 | 0.0 | 10899 | yukon xl | 0.000000 | NaN | oh | 0.000000 | 0.0 | clean | 0.086708 | 41.418454 | automatic | other | 0.101009 | gmc | 2007.0 | 0.069106 | 0.107280 | False | NaN | NaN | 0.000000 | 161585 | blue | gas | -81.720190 | gmc | akron / canton | 0.053546 | True | 2 |
| 3 | 0.0 | 14995 | a4 | 0.000000 | NaN | va | 0.000000 | 0.0 | clean | 0.000000 | 38.259970 | automatic | sedan | 0.174646 | audi | 2014.0 | 0.000000 | 0.139115 | False | NaN | NaN | 0.000000 | 78600 | black | gas | -77.493210 | NaN | fredericksburg | 0.046291 | True | 2 |
| 4 | 0.0 | 18998 | mazda6 | 0.000000 | NaN | mt | 0.059277 | 0.0 | clean | 0.288585 | 47.696062 | automatic | sedan | 0.122249 | mazda | 2016.0 | 0.000000 | 0.032460 | False | 4 cylinders | fwd | 0.068282 | 75890 | NaN | gas | -116.781406 | NaN | billings | 0.097209 | True | 2 |
| 5 | 0.0 | 20850 | transit t350 | 0.000000 | NaN | ok | 0.000000 | 0.0 | clean | 0.000000 | NaN | automatic | other | 0.000000 | ford | 2015.0 | 0.386626 | 0.000000 | False | NaN | NaN | 0.000000 | 169031 | black | diesel | NaN | ford | oklahoma city | 0.059915 | True | 2 |
| 6 | 0.0 | 20900 | silverado 1500 ltz 4x4 | 0.049531 | 2 | pa | 0.000000 | 0.0 | clean | 0.000000 | 41.135956 | automatic | pickup | 0.000000 | chevrolet | 2011.0 | 0.074786 | 0.058048 | False | 8 cylinders | NaN | 0.000000 | 110273 | grey | gas | -75.364945 | chevrolet | scranton / wilkes-barre | 0.057947 | True | 2 |
| 7 | 0.0 | 22777 | wrangler | 0.000000 | NaN | nh | 0.000000 | 0.0 | clean | 0.037063 | 43.066264 | automatic | SUV | 0.086354 | jeep | 2013.0 | 0.118159 | 0.045857 | False | NaN | 4wd | 0.000000 | 97822 | black | gas | -71.447000 | jeep | new hampshire | 0.000000 | True | 2 |
| 9 | 0.0 | 32999 | a8 | 0.000000 | NaN | nj | 0.000000 | 0.0 | clean | 0.000000 | 40.920150 | automatic | sedan | 0.135668 | audi | 2015.0 | 0.000000 | 0.144090 | False | NaN | NaN | 0.000000 | 40035 | NaN | gas | -74.193960 | NaN | north jersey | 0.191784 | True | 2 |
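The `# duplicates` column appears to give the number of times each fully identical row occurs. The following is a minimal sketch of that count in plain Python, assuming rows are represented as hashable tuples; the helper name `duplicate_counts` is hypothetical, and the rows are reduced to three of the 30 columns for brevity.

```python
from collections import Counter

# Illustrative rows reduced to three columns (price, model, state).
# A real check would compare every column of each row.
rows = [
    (23998, "renegade", "mt"),
    (6990, "accord", "fl"),
    (23998, "renegade", "mt"),
    (12989, "3 series", "va"),
    (6990, "accord", "fl"),
    (23998, "renegade", "mt"),
]


def duplicate_counts(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Print the most frequently occurring rows first.
for row, n in sorted(duplicate_counts(rows).items(), key=lambda kv: -kv[1]):
    print(n, row)
```

Rows that occur only once are dropped from the result, which mirrors a table that lists only the repeated records.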